NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Abstract

Data augmentation is an important method for evaluating the robustness of and enhancing the diversity of training data for natural language processing (NLP) models. In this paper, we present NL-Augmenter, a new participatory Python-based natural language (NL) augmentation framework which supports the creation of transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of NL tasks annotated with noisy descriptive tags. The transformations incorporate noise, intentional and accidental human mistakes, socio-linguistic variation, semantically-valid style, syntax changes, as well as artificial constructs that are unambiguous to humans. We demonstrate the efficacy of NL-Augmenter by using its transformations to analyze the robustness of popular language models. We find different models to be differently challenged on different tasks, with quasi-systematic score decreases. The infrastructure, datacards, and robustness evaluation results are publicly available on GitHub for the benefit of researchers working on paraphrase generation, robustness analysis, and low-resource NLP.
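To make the distinction between a transformation and a filter concrete, here is a minimal, self-contained Python sketch. It does not use the NL-Augmenter API: the function names `swap_adjacent_chars`, `min_length_filter`, and `augment`, along with their parameters, are hypothetical and chosen only for illustration. The transformation imitates accidental typing mistakes, and the filter selects a data split by sentence length.

```python
import random
from typing import List


def swap_adjacent_chars(sentence: str, prob: float = 0.1, seed: int = 0) -> str:
    """Transformation (hypothetical): imitate typos by swapping adjacent inner letters."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    words = []
    for word in sentence.split():
        chars = list(word)
        # Leave the first and last characters in place so the word stays recognizable.
        for i in range(1, len(chars) - 2):
            if rng.random() < prob:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
        words.append("".join(chars))
    return " ".join(words)


def min_length_filter(sentence: str, min_tokens: int = 5) -> bool:
    """Filter (hypothetical): keep sentences with at least `min_tokens` tokens."""
    return len(sentence.split()) >= min_tokens


def augment(sentences: List[str]) -> List[str]:
    """Select a split with the filter, then perturb each selected sentence."""
    selected = [s for s in sentences if min_length_filter(s)]
    return [swap_adjacent_chars(s, prob=0.3) for s in selected]


if __name__ == "__main__":
    data = [
        "too short",
        "Data augmentation improves model robustness",
    ]
    for line in augment(data):
        print(line)
```

Comparing a model's score on the original sentences with its score on the perturbed ones would give a rough robustness estimate in the spirit of the analysis the abstract describes; the sketch stops short of that step because it depends on the model and the task.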

Similar resources

NL Assistant: A Toolkit for Developing Natural Language Applications

We will be demonstrating a toolkit for developing natural language-based applications, along with two applications. The goals of this toolkit are to reduce development time and cost for natural language-based applications by reducing the amount of linguistic and programming work needed. Linguistic work has been reduced by integrating large-scale linguistic resources--Comlex (Grishman et al., 1993) a...

A Knowledge Framework for Natural Language Analysis

Recent research in language analysis and language generation has highlighted the role of knowledge representation in both processes. Certain knowledge representation foundations, such as structured inheritance networks and feature-based linguistic representations, have proved useful in a variety of language processing tasks. Augmentations to this common framework, however, are requi...

A Framework for Integrating Natural Language Tools

Natural language processing (NLP) systems are typically characterized by a pipeline architecture in which several independently developed NLP tools, connected as a chain of filters, apply successive transformations to the data that flows through the system. Hence, when integrating such tools, one may face problems that lead to information loss, such as: (i) tools discard information from their...

Real-Time Natural Language Generation in NL-SOAR

NL-Soar is a computer system that performs language comprehension and generation within the framework of the Soar architecture [New90]. NL-Soar provides language capabilities for systems working in a real-time environment. Responding in real time to changing situations requires a flexible way to shift control between language and task operations. To provide this flexibility, NL-Soar organizes g...

Journal

Journal title: Northern European Journal of Language Technology

Year: 2023

ISSN: 2000-1533

DOI: https://doi.org/10.3384/nejlt.2000-1533.2023.4725